Configurable collective communication in LAM-MPI

Authors

  • John Markus Bjørndalen
  • Otto J. Anshus
  • Brian Vinter
  • Tore Larsen
Abstract

In an earlier paper, we observed that PastSet (our experimental tuple space system) was 1.83 times faster on global reductions than LAM-MPI. Our hypothesis was that this was due to the better resource usage of the PATHS framework (an extension to PastSet that supports orchestration and configuration), which maps the communication and operations onto the computing resources and cluster topology more closely. This paper reports on an experiment to verify this, and represents ongoing work to add some of the same configurability of PastSet and PATHS to MPI. We show that by adding run-time configurable collective communication, we can reduce latencies without recompiling the application source code. For the same cluster where we observed the faster PastSet, we show that Allreduce with our configuration mechanism is 1.79 times faster than the original LAM-MPI Allreduce. We also experiment with the configuration mechanism on three different cluster platforms with 2-, 4-, and 8-way nodes. For the cluster of 8-way nodes, we show an improvement by a factor of 1.98 for Allreduce.
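As background for the latency figures above, the sketch below is a minimal, generic Allreduce micro-benchmark in C. It uses only standard MPI calls, and the iteration count and message size are arbitrary choices for illustration; it is not the paper's benchmark code and does not exercise the run-time configuration mechanism, which lives inside the modified LAM-MPI library.

    /* Minimal sketch of an Allreduce latency micro-benchmark.
     * Generic MPI code for illustration only; the paper's run-time
     * configuration mechanism is internal to the modified LAM-MPI
     * library, so nothing here would need to change to use it.
     * Build with: mpicc allreduce_bench.c */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int iters = 1000;            /* arbitrary repetition count */
        int in = 0, out = 0;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        in = rank;

        MPI_Barrier(MPI_COMM_WORLD);       /* align processes before timing */
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            /* Global sum reduction; the result is delivered to every process. */
            MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("average Allreduce latency: %g us\n",
                   (t1 - t0) / iters * 1e6);

        MPI_Finalize();
        return 0;
    }

With the approach described in the abstract, application code like this stays unchanged; only the library-side mapping of the collective onto the cluster topology is reconfigured at run time.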


Similar articles

CoMPI – Configurable Collective Operations in LAM/MPI

This paper describes an extension to LAM/MPI[3] which enables the user to configure a subset of the collective operations using Scheme[5], a high-level, general-purpose programming language in the Lisp family. Currently the operations that may be configured are broadcast, reduce, allreduce and barrier, but the system is general enough to be extended with other operations if that is r...


Group Management Schemes for Implementing MPI collective Communication over IP-Multicast

Recent advances in multicasting present new opportunities for improving communication performance for clusters of workstations. Realizing collective communication over multicast primitives can achieve higher performance than over unicast primitives. However, implementing collective communication using multicast primitives presents new issues and challenges. Group management, which may result in...


The Performance of Configurable Collective Communication for LAM-MPI in Clusters and Multi-Clusters

Using a cluster of eight four-way computers, PastSet, an experimental tuple-space-based shared memory system, has been measured to be 1.83 times faster on global reduction than the Allreduce operation of LAM-MPI. Our hypothesis is that this is due to PastSet using a better mapping of processes to computers, resulting in fewer messages and more use of local processor cycles to compute partia...


A Case for Non-blocking Collective Operations

Non-blocking collective operations for MPI have been under discussion for a long time. We want to contribute to this discussion, give a rationale for the use of these operations, and assess their possible benefits. A LogGP model for the CPU overhead of collective algorithms and a benchmark to measure it are provided, showing a large potential to overlap communication and computation. We show ...
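As a side note, the overlap potential mentioned here is easiest to see with a non-blocking collective call. The sketch below uses MPI_Iallreduce, which was later standardized in MPI-3; it is illustrative only, is not code from the cited paper, and do_independent_work is a hypothetical placeholder for computation that does not depend on the reduction result.

    /* Overlapping communication and computation with a non-blocking
     * collective (MPI_Iallreduce, standardized in MPI-3). Sketch only. */
    #include <mpi.h>

    /* Hypothetical placeholder for work independent of the reduction. */
    static void do_independent_work(void) { /* ... */ }

    void overlapped_sum(double *local, double *global, int n, MPI_Comm comm)
    {
        MPI_Request req;

        /* Start the reduction, but do not wait for it yet. */
        MPI_Iallreduce(local, global, n, MPI_DOUBLE, MPI_SUM, comm, &req);

        /* Useful work proceeds while the collective is in flight,
         * hiding part of the communication latency. */
        do_independent_work();

        /* Block only when the reduced values are actually needed. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }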


Developing a Thin and High Performance Implementation of Message Passing Interface

A communication library is a substantially important part of developing parallel applications on PC clusters. MPI is currently the most important message-passing standard in worldwide use. Although powerful, MPI is very complex and requires a certain amount of effort to learn. In fact, only a basic set of MPI functions is enough to develop a large class of parallel applications....
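The "basic set" alluded to here is commonly taken to be the six classic MPI calls (MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv, MPI_Finalize). The sketch below is a generic illustration of that minimal subset using a simple rank-gathering pattern; it is not taken from the cited paper.

    /* A generic program using only a minimal subset of MPI
     * (the six classic calls). Illustrative only. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank != 0) {
            /* Every worker sends its rank to process 0. */
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else {
            int value;
            for (int src = 1; src < size; src++) {
                MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("received %d from process %d\n", value, src);
            }
        }

        MPI_Finalize();
        return 0;
    }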



Publication year: 2002